So...GPU virtualization


GPU Accelerated Tasks 


Games 
Video Playback/Edit 
Web Experience 


Office Productivity 


User Interface 


Computer Aided Design 


Weather broadcast 

Topics

* Intel GPU virtualization approaches
* A little history of the GVT-g project
* VFIO with mediated device
* GVT-g device model

Intel® Open Source Technology Center


Intel GPU virtualization approaches 


[Figure: graphics stack diagram (DirectX* APIs, OpenGL* APIs, graphics driver)]

API forwarding

Pros:
* Performance
* Sharing

Cons:
* No media/GPGPU
* Lacks compatibility

Direct pass-through

Pros:
* Performance
* Capability

Cons:
* No sharing

Mediated pass-through (GVT-g)

Pros:
* Performance
* Capability
* Sharing

GVT-g history


2011-2012 Project Start: First XenGT POC done in 2012 on Sandybridge 
2013 Work with Citrix for XenGT production, Haswell support 
2014 Add KVMGT support, Broadwell support 
2015 GVT device model rewrite for upstream i915, Skylake support 
2016 Engage in VFIO/mdev model, KVMGT upstream merged at 4.10! 
2017 Kabylake and more features support, XenGT upstream ongoing

GVT-g architecture overview

[Figure: architecture diagram. Labels: Host Linux; Libvirt (create vGPU, pass vGPU UUID); QEMU (get vGPU device info); Guest OS with GFX driver; emulation services]

VFIO

* Originally "Virtual Function I/O", now "Versatile Framework for userspace I/O"
- VFIO is a secure, userspace driver framework
- IOMMU-based DMA mapping and isolation (iommu_group)
- Full device access (MMIO, I/O port, PCI config)
- Used for physical device assignment to a VM, and now for virtual device assignment as well
* Device assignment = userspace driver
- Access to device resources
- Isolation and secure DMA mapping through an IOMMU
- Interrupt signaling support

VFIO resource access

* Divided into regions with index
* Each region maps to a device resource
- MMIO BAR, I/O BAR, PCI config space
* Region count and info discovered through ioctl
- VFIO_DEVICE_GET_REGION_INFO
* Fast "mmap", slow "read/write"
* Access path
- Trapped by hypervisor (QEMU/KVM)
- MemoryRegion lookup performed
- MemoryRegion.{read,write} accessors called
- Read/write VFIO region offsets
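
The slow read/write path above can be sketched as a toy model in Python. This is not the VFIO API: `Region`, `ToyVfioDevice`, and the offsets and sizes are hypothetical names and values chosen for illustration. Only the two index constants follow the `VFIO_PCI_*_REGION_INDEX` enum in `linux/vfio.h`.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Index values as in the VFIO_PCI_*_REGION_INDEX enum of linux/vfio.h.
BAR0_REGION = 0
CONFIG_REGION = 7


@dataclass
class Region:
    """One device resource, as a region-info query would describe it."""
    index: int
    name: str
    offset: int  # start of the region inside the device fd (made up here)
    size: int
    backing: bytearray = field(init=False, repr=False)

    def __post_init__(self) -> None:
        self.backing = bytearray(self.size)


class ToyVfioDevice:
    """Hypothetical stand-in for a VFIO device file descriptor."""

    def __init__(self, regions: List[Region]) -> None:
        self.regions: Dict[int, Region] = {r.index: r for r in regions}

    def get_region_info(self, index: int) -> Region:
        # Plays the role of the VFIO_DEVICE_GET_REGION_INFO ioctl.
        return self.regions[index]

    def _lookup(self, offset: int, length: int) -> Region:
        # Find the region that fully contains [offset, offset + length).
        for region in self.regions.values():
            if region.offset <= offset and offset + length <= region.offset + region.size:
                return region
        raise ValueError(f"access at {offset:#x} is outside every region")

    def pwrite(self, data: bytes, offset: int) -> None:
        # A trapped guest write ends up as a write at a region offset.
        region = self._lookup(offset, len(data))
        start = offset - region.offset
        region.backing[start:start + len(data)] = data

    def pread(self, length: int, offset: int) -> bytes:
        region = self._lookup(offset, length)
        start = offset - region.offset
        return bytes(region.backing[start:start + length])


dev = ToyVfioDevice([
    Region(BAR0_REGION, "MMIO BAR0", offset=0x0000, size=0x1000),
    Region(CONFIG_REGION, "PCI config space", offset=0x1000, size=0x100),
])
cfg = dev.get_region_info(CONFIG_REGION)
dev.pwrite(b"\x86\x80", cfg.offset)  # write two bytes at config offset 0
vendor_id = int.from_bytes(dev.pread(2, cfg.offset), "little")
```

In a real setup the offsets come from the kernel's answer to the region-info ioctl rather than from constants, and mmap-capable regions bypass this path entirely.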

Mediated device framework

* Joint work by NVIDIA, Red Hat, Intel, and IBM
* Represents a virtual device to userspace via the VFIO interface
* Virtual device access is handled by a vendor-specific driver that mediates resource sharing

[Figure: mdev framework. The mdev core module sits between two interfaces. On one side, VFIO mdev registers with mdev_register_driver(), implements probe()/remove(), and exposes the VFIO user API. On the physical device interface side, vendor drivers (kvmgt.ko with i915/gvt for the GPU hardware, nvidia.ko) register with mdev_register_device() and supply callbacks.]

Mdev create & assignment

* Vendor driver registers the device
- mdev create: create a virtual device
- mdev destroy
- mdev_supported_types: typed mdev configuration
* vGPU types are based on memory/fence/resolution configs
* UUID-based device node:
- /sys/bus/mdev/devices/$UUID/
* Get the VFIO device file descriptor and present it to the VM
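
As a rough illustration of the create step, the sketch below writes a UUID to the `create` node of a supported type. The helper names (`list_mdev_types`, `create_mdev`, `mdev_node`) are hypothetical. The demo runs against a temporary directory that mimics the sysfs layout, so it does not need GVT-g hardware; on a real host the parent would be the GPU's sysfs directory and the write would need root.

```python
import tempfile
import uuid
from pathlib import Path
from typing import List


def list_mdev_types(parent: Path) -> List[str]:
    """Return the type names found under <parent>/mdev_supported_types/."""
    types_dir = parent / "mdev_supported_types"
    return sorted(p.name for p in types_dir.iterdir() if p.is_dir())


def create_mdev(parent: Path, mdev_type: str, dev_uuid: str) -> None:
    """Request a new mediated device by writing its UUID to the `create` node."""
    create_node = parent / "mdev_supported_types" / mdev_type / "create"
    create_node.write_text(dev_uuid + "\n")


def mdev_node(dev_uuid: str, sysfs: Path = Path("/sys")) -> Path:
    """Path of the UUID-based device node on the mdev bus."""
    return sysfs / "bus" / "mdev" / "devices" / dev_uuid


# Dry run: a temporary directory stands in for the parent device in sysfs.
fake_parent = Path(tempfile.mkdtemp()) / "0000:00:02.0"
(fake_parent / "mdev_supported_types" / "i915-GVTg_V5_4").mkdir(parents=True)

new_uuid = str(uuid.uuid4())
available = list_mdev_types(fake_parent)
create_mdev(fake_parent, available[0], new_uuid)
node = mdev_node(new_uuid)
```

On a real system the kernel, not this script, creates the device node once the vendor driver accepts the request.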

MDev resource access

* Get region info via VFIO ioctl from the vendor driver
* Guest MMIO access is trapped by KVM
* KVM forwards it to the QEMU VFIO driver
* Converted to a R/W request on VFIO device regions
* Handled by the mediated vendor driver in the kernel

MDev DMA

* QEMU sets up guest memory
* VFIO_IOMMU_MAP_DMA with {GFN, VA}
* Vendor driver calls vfio_pin_pages() to get the PFN
- VFIO keeps reference-counted pinned <IOVA, PFN>
* Vendor driver calls the DMA mapping API for pIOVA
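
The reference-counted pinning in the DMA flow above can be illustrated with a small toy model. `ToyPinTracker` and the `va_to_pfn` translation are hypothetical and are not kernel interfaces. The guest frame number plays the role of the IOVA here.

```python
from typing import Callable, Dict, Tuple


class ToyPinTracker:
    """Toy bookkeeping for mapped and pinned guest pages."""

    def __init__(self, va_to_pfn: Callable[[int], int]) -> None:
        self._va_to_pfn = va_to_pfn                  # stand-in for host page lookup
        self._mappings: Dict[int, int] = {}          # GFN -> host virtual address
        self._pinned: Dict[int, Tuple[int, int]] = {}  # GFN -> (PFN, refcount)

    def map_dma(self, gfn: int, va: int) -> None:
        # QEMU side: register where the guest page lives in host virtual memory.
        self._mappings[gfn] = va

    def pin_page(self, gfn: int) -> int:
        # Vendor driver side: get the PFN; repeated pins only bump the count.
        if gfn in self._pinned:
            pfn, refs = self._pinned[gfn]
            self._pinned[gfn] = (pfn, refs + 1)
            return pfn
        va = self._mappings[gfn]  # KeyError if the page was never mapped
        pfn = self._va_to_pfn(va)
        self._pinned[gfn] = (pfn, 1)
        return pfn

    def unpin_page(self, gfn: int) -> None:
        pfn, refs = self._pinned[gfn]
        if refs == 1:
            del self._pinned[gfn]  # last reference: the page is no longer pinned
        else:
            self._pinned[gfn] = (pfn, refs - 1)

    def refcount(self, gfn: int) -> int:
        return self._pinned.get(gfn, (0, 0))[1]


# Made-up translation: one host page per 4 KiB of virtual address space.
tracker = ToyPinTracker(va_to_pfn=lambda va: (va >> 12) + 0x100)
tracker.map_dma(gfn=0x10, va=0x7F0000010000)

first = tracker.pin_page(0x10)
second = tracker.pin_page(0x10)
count_after_two_pins = tracker.refcount(0x10)
tracker.unpin_page(0x10)
tracker.unpin_page(0x10)
count_after_unpins = tracker.refcount(0x10)
```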

MDev interrupt

* QEMU sets up the KVM irqfd
* QEMU notifies the vendor driver of the irqfd via the VFIO interface
- VFIO_DEVICE_SET_IRQS
* Vendor driver injects an interrupt by signaling on the eventfd
- Directly injected into the VM




GVT-g device model

[Figure: GVT-g device model block diagram. Each mdev is backed by gvt vGPU state. Components: MMIO handler, GTT shadow, balloon, scheduler, virtual display, resource allocation. VFIO/KVM services: user memory access/pin/unpin, IRQ injection, write-protect tracking. Request submission and request completion pass between gvt and i915, which drives the GPU HW.]

* Where GPU virtualization logic actually lives
- Virtual GPU state maintenance for the VM
- MMIO handler to emulate HW access behavior for the guest driver
- GPU workload submission emulation and VM notification

vGPU memory management

* Global graphics memory is partitioned
* Ballooned space
* Aperture access without trap
* Fully shadowed PPGTT
* Use KVM guest page write-protect for page table update tracking

[Figure: the guest global page table and its shadow global page table pointing into host memory; guest page directory table (PDEs) and page table (PTEs) alongside the shadow page directory table (PDEs) and shadow page table (PTEs)]
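
A toy model may help show how write-protect tracking keeps a shadow table in sync. It is a single-level simplification with hypothetical names (`ToyShadowPageTable`, `on_guest_pte_write`), while the real PPGTT has several levels. It assumes every guest write to the page table is trapped and handed to the handler.

```python
from typing import Dict, Optional


class ToyShadowPageTable:
    """Single-level guest page table with a shadow copy used by the GPU."""

    def __init__(self, gfn_to_pfn: Dict[int, int]) -> None:
        self.gfn_to_pfn = gfn_to_pfn         # guest frame -> host frame (made up)
        self.guest_pt: Dict[int, int] = {}   # entry index -> guest frame number
        self.shadow_pt: Dict[int, int] = {}  # entry index -> host frame number

    def on_guest_pte_write(self, index: int, gfn: Optional[int]) -> None:
        # Runs when a write to the write-protected page table page is trapped.
        if gfn is None:
            # The guest cleared the entry, so drop the shadow entry as well.
            self.guest_pt.pop(index, None)
            self.shadow_pt.pop(index, None)
            return
        self.guest_pt[index] = gfn
        self.shadow_pt[index] = self.gfn_to_pfn[gfn]

    def gpu_translate(self, index: int) -> int:
        # The hardware only ever walks the shadow table.
        return self.shadow_pt[index]


spt = ToyShadowPageTable(gfn_to_pfn={0x10: 0xA0, 0x11: 0xB3})
spt.on_guest_pte_write(0, 0x10)
before = spt.gpu_translate(0)
spt.on_guest_pte_write(0, 0x11)  # the guest repoints the entry
after = spt.gpu_translate(0)
spt.on_guest_pte_write(0, None)  # the guest clears the entry
```

The point of the design is that the guest never sees host frame numbers, yet the table the hardware walks always reflects the guest's latest update.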

vGPU workload execution

* vGPU shadow context
* Virtualized execlist interface
* Command parser on vGPU ring/privileged buffer
* Emulated user interrupt

[Figure: workload flow. Labels: vGPU workloads; GVT-g workload scheduling; dispatch i915 request; scheduling events; host i915 scheduling; i915 requests]

vGPU scheduling

* vGPU instance: time-based scheduling
* A scheduled vGPU instance can submit requests to all engines
- Per-engine work thread
* Scheduler policy based on vGPU weight
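
One plausible reading of weight-based, time-based scheduling is that each vGPU gets a share of a fixed period proportional to its weight. The sketch below assumes exactly that. The function names, the 100 ms period, and the weights are illustrative assumptions, not values taken from the GVT-g scheduler.

```python
from typing import Dict, Iterator, Tuple


def time_slices(weights: Dict[str, int], period_ms: float) -> Dict[str, float]:
    """Split one scheduling period among vGPUs in proportion to their weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one vGPU needs a positive weight")
    return {name: period_ms * weight / total for name, weight in weights.items()}


def run_periods(weights: Dict[str, int], period_ms: float,
                periods: int) -> Iterator[Tuple[str, float]]:
    """Yield (vGPU, slice length) in the order the vGPUs would be scheduled."""
    slices = time_slices(weights, period_ms)
    for _ in range(periods):
        for name, slice_ms in slices.items():
            yield name, slice_ms


weights = {"vgpu1": 4, "vgpu2": 2, "vgpu3": 2}
slices = time_slices(weights, period_ms=100)
timeline = list(run_periods(weights, period_ms=100, periods=2))
```

With these weights, `vgpu1` receives half of each period and the other two vGPUs a quarter each.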

vGPU fully virtualized display

[Figure: virtual display diagram, including the render engine]

HOWTO

* Kernel config (>= 4.10)
- CONFIG_VFIO, CONFIG_VFIO_MDEV, CONFIG_VFIO_MDEV_DEVICE
- CONFIG_DRM_I915_GVT, CONFIG_DRM_I915_GVT_KVMGT
- i915.enable_gvt=1
* Create mdev (vGPU)
- "echo $UUID > /sys/devices/pci0000:00/0000:00:02.0/mdev_supported_types/i915-GVTg_V5_4/create"
* Start VM
- "qemu-system-x86_64 -m 1024 -enable-kvm -device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:00:02.0/$UUID..."
* Detailed HOWTO
- https://github.com/01org/gvt-linux/wiki
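
Before creating a vGPU it can be handy to confirm that the kernel options listed above are enabled. The helper below is a minimal sketch that scans kernel config text for them. `missing_options` is a hypothetical name, and the sample text is made up. On many distributions the real config is in `/boot/config-<release>` or `/proc/config.gz`, but that varies.

```python
from typing import List

REQUIRED_OPTIONS = (
    "CONFIG_VFIO",
    "CONFIG_VFIO_MDEV",
    "CONFIG_VFIO_MDEV_DEVICE",
    "CONFIG_DRM_I915_GVT",
    "CONFIG_DRM_I915_GVT_KVMGT",
)


def missing_options(config_text: str) -> List[str]:
    """Return the required options that are neither built in (y) nor modular (m)."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # comments such as "# CONFIG_FOO is not set" are skipped
        name, value = line.split("=", 1)
        if value in ("y", "m"):
            enabled.add(name)
    return [opt for opt in REQUIRED_OPTIONS if opt not in enabled]


sample_config = """\
CONFIG_VFIO=m
CONFIG_VFIO_MDEV=m
CONFIG_VFIO_MDEV_DEVICE=m
CONFIG_DRM_I915_GVT=y
# CONFIG_DRM_I915_GVT_KVMGT is not set
"""
missing = missing_options(sample_config)
```

The `i915.enable_gvt=1` setting is a module parameter rather than a config option, so this check does not cover it.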

Current upstream status

* KVMGT is fully supported upstream, kernel >= 4.10, qemu
* Supports Broadwell/Skylake/Kabylake for Linux (including Android) guests (kernel >= 4.8) and Windows guests
* All kinds of GPU applications are supported in the guest
- Although some media features that depend on GuC/HuC firmware support are missing
* MTBF time (1 Windows VM): more than 1 week
* Performance (media workload)
- Peak perf 95% of native host (1 VM)
- Reaches on average over 85% of native performance (1 VM run)
* Links:
- https://github.com/01org/gvt-linux.git
- https://01.org/igvt-g